Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols

Authors

  • George Bosilca
  • Aurelien Bouteiller
  • Thomas Hérault
  • Pierre Lemarinier
  • Jack J. Dongarra

Abstract

With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Nevertheless, few applications are fault tolerant; most need a seamless recovery framework. Among the automatic fault-tolerance techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols is the performance penalty on communications during failure-free periods, which comes mostly from the payload copy introduced for each message. In this paper, we investigate different approaches for logging the payload and compare their impact on network performance.
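The abstract does not say which payload-logging approaches the paper compares. For reference, here is a minimal Python sketch of plain sender-based message logging in a single-threaded toy model. It shows where the per-message copy sits. The classes `SenderLog`, `EventLogger`, and `Process` and the list used as a "network" are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLog:
    """Hypothetical sender-side payload log kept in volatile memory."""
    entries: dict = field(default_factory=dict)
    bytes_copied: int = 0

    def record(self, dest: int, seq: int, payload: bytearray) -> None:
        # The per-message copy that sender-based logging pays for: the
        # application may overwrite its send buffer once send() returns,
        # so the log must keep its own immutable snapshot.
        snapshot = bytes(payload)
        self.entries[(dest, seq)] = snapshot
        self.bytes_copied += len(snapshot)

    def replay(self, dest: int, from_seq: int) -> list:
        # After `dest` restarts, resend its logged messages in sequence order.
        return [p for (d, s), p in sorted(self.entries.items())
                if d == dest and s >= from_seq]


class EventLogger:
    """Hypothetical reliable event logger: stores determinants, not payloads."""

    def __init__(self) -> None:
        self.determinants = []

    def log(self, receiver: int, sender: int, seq: int, order: int) -> None:
        self.determinants.append((receiver, sender, seq, order))


class Process:
    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.next_seq = {}    # per-destination send sequence numbers
        self.recv_count = 0   # reception order, the nondeterministic event
        self.log = SenderLog()
        self.network = []     # hypothetical stand-in for the wire
        self.inbox = []

    def send(self, dest: int, payload: bytearray) -> int:
        seq = self.next_seq.get(dest, 0)
        self.next_seq[dest] = seq + 1
        self.log.record(dest, seq, payload)
        # Transfer to the network (modeled separately from the logging copy).
        self.network.append((self.rank, dest, seq, bytes(payload)))
        return seq

    def deliver(self, msg: tuple, logger: EventLogger) -> None:
        sender, dest, seq, payload = msg
        assert dest == self.rank
        self.recv_count += 1
        # Only the small reception determinant goes to the event logger.
        logger.log(self.rank, sender, seq, self.recv_count)
        self.inbox.append(payload)


a, b = Process(0), Process(1)
logger = EventLogger()
buf = bytearray(b"hello")
a.send(1, buf)
buf[:] = b"HELLO"  # the application reuses its send buffer
a.send(1, buf)
for msg in a.network:
    b.deliver(msg, logger)

print(a.log.replay(dest=1, from_seq=0))  # [b'hello', b'HELLO']
print(a.log.bytes_copied)                # 10
print(logger.determinants)               # [(1, 0, 0, 1), (1, 0, 1, 2)]
```

The relevant line is `bytes(payload)` in `SenderLog.record`. It runs on every send, on the communication path, and its cost grows with message size. This is the failure-free penalty the abstract describes. The determinants sent to the event logger stay small regardless of payload size.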

Similar resources

Energy-Aware Probabilistic Epidemic Forwarding Method in Heterogeneous Delay Tolerant Networks

Due to the increasing use of wireless communications, infrastructure-less networks such as Delay Tolerant Networks (DTNs) should be highly considered. DTN is most suitable where there is an intermittent connection between communicating nodes such as wireless mobile ad hoc network nodes. In general, a message sending node in DTN copies the message and transmits it to nodes which it encounters. A...

Improving Message Logging Protocols Scalability through Distributed Event Logging

Message logging is an attractive solution to provide fault tolerance for message passing applications because it is more scalable than coordinated checkpointing. Sender-based message logging is a well-known optimization that saves message payloads in the sender's memory, so that only the events corresponding to message receptions have to be logged reliably using an event logger. In existin...

The Cost of Recovery in Message Logging Protocols

Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex trade-off when choosing a message logging protocol for fault to...

Lightweight Message Logging Protocol for Distributed Sensor Networks

Among the many rollback-recovery protocols developed to provide fault tolerance for long-running distributed applications, sender-based message logging with checkpointing is one of the most lightweight fault-tolerance techniques applicable in this field, significantly decreasing the high failure-free overhead of synchronous logging by using the message sender's volatile memory as...

The Relative Overhead of Piggybacking in Causal Message Logging Protocols

Message logging protocols ensure that crashed processes make the same choices when re-executing nondeterministic events during recovery. Causal message logging protocols achieve this by piggybacking the results of these choices (called determinants) on the ambient message traffic. By doing so, these protocols do not create orphan processes nor introduce blocking in failure-free executions. To s...

Publication date: 2010